SQL practice session

To start off, re-write the code to extract the 5GB file to a database. You can look at my code if you get stuck, but dont copy paste. Look at the code, but type it in yourself.

  • Make sure you delete the old database file, or rename it

I recommend you open a new Jupyter sheet to run your code

Click on File->New Notebook -> Python3

Next: Some practice of sql queries


In [8]:
# import sqlite3 here

In [21]:
#open connection to database

In [10]:
# 1st challenge: Write a sql query to search for the name: zoidberg

# Note: It will return 0


[]

It returned empty because there is no zoidberg in our list.

Add it below.

Name: Zoidberg

Legal name: Planet Express

City: New New York

State: New New York


In [1]:
# Add the zoidberg data to the database below. remeber to commit()

In [12]:
# Search for zoidberg again. This time, you should get the results below:


[('Zoidberg', 'Planet Express', 'New New York', 'New New York')]

Next: Compare the number of practices in New york (NY) vs Texas (Tx)


In [13]:
# Count number of practices in New York vs Texas

# 1st, get number in New York. You should get the values below

print("Number in NY: ", len(result))


Number in NY:  338225

In [5]:
# Now get Texas: 

print("Number in TX: ", len(result))


Number in TX:  230430

Next: Find the number of people with John in their name


In [25]:
# Find number of Johns. Remember, this uses the % symbol


49360

There is a problem with the above: It will include names like Johnathan.

What if we only want people with the name John?

Hint: In the search query, use 'john %'. Notice the space before %

Try that below:


In [23]:
print(len(result))

#This time, printing the 1st 6 results as well, to check.
print(result[:6])


47583
[('JOHN CHU-HONG CHANG', '', 'PEARL RIVER', 'NY'), ('JOHN S PULIZZI', '', 'ALEXANDRIA', 'VA'), ('JOHN A GARCIA', '', 'SANTA FE', 'NM'), ('JOHN DAVID SHEPHERD', '', 'PINEHURST', 'NC'), ('JOHN PATRICK FOGARTY', '', 'GREENVILLE', 'NC'), ('JOHN  DAY', '', 'MARIETTA', 'GA')]

In [27]:
# Now find all Johns in the state 'AL'

print(len(result))
print(result[:6])


692
[('JOHN CRAIG ROMER', '', 'HUNTSVILLE', 'AL'), ('JOHN ALVA LONG', '', 'BIRMINGHAM', 'AL'), ('JOHN F JOHNSON', '', 'MOBILE', 'AL'), ('JOHN MICHAEL JONES', '', 'HUNTSVILLE', 'AL'), ('JOHN P. MOORE', '', 'DOTHAN', 'AL'), ('JOHN C. BULLINGTON', '', 'BIRMINGHAM', 'AL')]

In [28]:
# Finally, Johns in AL and city of Mobile.

print(len(result))
print(result[:6])


63
[('JOHN F JOHNSON', '', 'MOBILE', 'AL'), ('JOHN WILLIAM BOYER', '', 'MOBILE', 'AL'), ('JOHN EVERETT STONE', '', 'MOBILE', 'AL'), ('JOHN C MCANDREW', '', 'MOBILE', 'AL'), ('JOHN E SEMON', '', 'MOBILE', 'AL'), ('JOHN RAYMOND SANDS', '', 'MOBILE', 'AL')]

In [29]:
#Always close the database!

conn.close()

In [ ]: